Abstract

Due to its kinship with XPath, satisfiability issues for propositional
dynamic logic (PDL) on trees in presence of some tree language have
been extensively studied in the XML community. We describe two
applications of the same problem for model-theoretic syntax and compiler
construction, where the tree language is the set of parse trees of some
word according to a context-free grammar. This new variant is still
complete for exponential time, but we explore the impact of natural
grammar restrictions and establish complexities ranging from nondeterministic
polynomial time to polynomial space in the relevant cases.



1 Introduction 



Although originally motivated by applications in computational linguistics
(Kracht 1995; Palm 1999; Afanasiev et al. 2005), propositional dynamic logic
on trees (PDL_tree) has been extensively studied in the XML community
(Marx 2005; Benedikt et al. 2008; ten Cate and Segoufin 2010), where it is
better known as Regular XPath. A prominent algorithmic problem in this context
is the satisfiability of formulae in presence of a tree language, the latter
being typically described by a DTD: given a formula φ and a tree language L,
does there exist a tree t in L which is also a model of φ? Benedikt et al.
(2008) comprehensively investigate this topic, and in some restricted cases
the problem becomes tractable (Montazerian et al. 2007; Ishihara et al. 2009).



In this paper, we investigate a variant of the problem, where the tree
language L we consider is a parse forest, i.e. the set of parse trees of a
word w according to a context-free grammar G. More precisely, after recalling
the formal apparatus of PDL_tree in Section 2, we present in Section 3 two
applications of the parse forest model-checking problem (PFMC):

• in computational linguistics, where we advocate a mixed approach for
model-theoretic syntax (Pullum and Scholz 2001), with syntactic structures
described by the conjunction of a grammar with a PDL_tree constraint,
and

• in compiler construction, where PDL_tree formulae provide a compelling
means for parser disambiguation (Thorup 1994; Klint and Visser 1994;
Kats et al. 2010).



These applications motivate (1) practically relevant restrictions on the grammar
G, which have no natural counterpart in the XML literature, and (2) considering
the full logic PDL_tree rather than the weaker PDL_core fragment (aka Core
XPath) employed in XML processing.

We map the resulting complexity landscape for the problem in Section 4.
Although the general case is ExpTime-complete like the classical problem, our
cases of interest have more affordable PSpace-complete and NPTime-complete
complexities (see Figure 3 for a summary). A somewhat surprising corollary for
model-theoretic syntax is that the recognition problem for PDL_tree, i.e. whether
there exists a tree model with the input word as yield, is PSpace-complete if
empty labels are forbidden; the best algorithms for this were only known to
operate in exponential time (Cornell 2000; Palm 2004).



2 Propositional Dynamic Logic on Trees 



Propositional dynamic logic (PDL, see (Fischer and Ladner 1979)) is a modal
logic where "programs", in the form of regular expressions over the relations
in a frame, are used as modal operators. In the case of ordered trees (Kracht
1995; Afanasiev et al. 2005; ten Cate and Segoufin 2010), the basic relations are
the child relation ↓ between a parent node and any of its immediate children,
and the right-sibling relation → between a node and its immediate right sibling.



2.1 Syntax and Semantics 

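Formally, a PDL_tree formula φ is defined by the abstract syntax

\[
\begin{array}{rcll}
\varphi & ::= & p \mid \top \mid \neg\varphi \mid \varphi \wedge \varphi \mid \langle\pi\rangle\varphi & \text{(node formulae)} \\
\pi & ::= & {\downarrow} \mid {\rightarrow} \mid \pi^{-1} \mid \pi ; \pi \mid \pi + \pi \mid \pi^{*} \mid \varphi? & \text{(path formulae)}
\end{array}
\]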



where p is an atomic proposition ranging over some countable set AP; because
we only deal with satisfiability questions, we can actually assume AP to be finite.
We enrich this syntax as usual by defining box modalities as duals [π]φ = ¬⟨π⟩¬φ
of the diamond ones, inverses to the atomic path formulae as ↑ = ↓⁻¹ and
← = →⁻¹, and boolean connectives ⊥ = ¬⊤, φ₁ ∨ φ₂ = ¬(¬φ₁ ∧ ¬φ₂),
φ₁ ⊃ φ₂ = ¬φ₁ ∨ φ₂, and φ₁ ≡ φ₂ = (φ₁ ⊃ φ₂) ∧ (φ₂ ⊃ φ₁).

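Formulae are interpreted over finite ordered trees t with nodes labeled by
subsets of AP. Such a tree t is a partial function from the set N* of finite
sequences of natural numbers to 2^AP, s.t. its domain dom t is (1) finite, (2) prefix
closed, i.e. uv in dom t for some u, v in N* implies that u is also in dom t, and
(3) predecessor closed, i.e. if ui is in dom t for some u in N* and i in N, then
uj is also in dom t for all j < i in N. Such a tree can be seen as a structure
𝔐_t = (dom t, ↓_t, →_t, t) with

\[
{\downarrow_t} \stackrel{\text{def}}{=} \{(u, ui) \mid ui \in \mathrm{dom}\, t\}
\qquad
{\rightarrow_t} \stackrel{\text{def}}{=} \{(ui, u(i+1)) \mid u(i+1) \in \mathrm{dom}\, t\} .
\]

We define the interpretations of PDL_tree formulae over t inductively by

\[
\begin{array}{lll}
\llbracket \top \rrbracket_t \stackrel{\text{def}}{=} \mathrm{dom}\, t
& \llbracket \neg\varphi \rrbracket_t \stackrel{\text{def}}{=} \mathrm{dom}\, t \setminus \llbracket \varphi \rrbracket_t
& \llbracket p \rrbracket_t \stackrel{\text{def}}{=} \{ u \in \mathrm{dom}\, t \mid p \in t(u) \} \\
\llbracket \varphi_1 \wedge \varphi_2 \rrbracket_t \stackrel{\text{def}}{=} \llbracket \varphi_1 \rrbracket_t \cap \llbracket \varphi_2 \rrbracket_t
& \llbracket \langle\pi\rangle\varphi \rrbracket_t \stackrel{\text{def}}{=} \llbracket \pi \rrbracket_t^{-1}(\llbracket \varphi \rrbracket_t)
& \llbracket \varphi? \rrbracket_t \stackrel{\text{def}}{=} \{ (u, u) \mid u \in \llbracket \varphi \rrbracket_t \} \\
\llbracket {\downarrow} \rrbracket_t \stackrel{\text{def}}{=} {\downarrow_t}
& \llbracket {\rightarrow} \rrbracket_t \stackrel{\text{def}}{=} {\rightarrow_t}
& \llbracket \pi_1 ; \pi_2 \rrbracket_t \stackrel{\text{def}}{=} \llbracket \pi_1 \rrbracket_t \,;\, \llbracket \pi_2 \rrbracket_t \\
\llbracket \pi_1 + \pi_2 \rrbracket_t \stackrel{\text{def}}{=} \llbracket \pi_1 \rrbracket_t \cup \llbracket \pi_2 \rrbracket_t
& \llbracket \pi^{*} \rrbracket_t \stackrel{\text{def}}{=} \llbracket \pi \rrbracket_t^{*}
& \llbracket \pi^{-1} \rrbracket_t \stackrel{\text{def}}{=} \llbracket \pi \rrbracket_t^{-1}
\end{array}
\]

Observe that these are sets of nodes included in dom t in the case of node
formulae, but binary relations included in dom t × dom t in the case of path
formulae; thus ⟦π⟧_t^* denotes the reflexive transitive closure of ⟦π⟧_t and
⟦π⟧_t^{-1} its inverse, while ⟦π₁⟧_t ; ⟦π₂⟧_t denotes the composition of the two
relations ⟦π₁⟧_t and ⟦π₂⟧_t. A node u in dom t satisfies φ, noted t, u ⊨ φ, if u is
in ⟦φ⟧_t. A tree t satisfies φ, noted t ⊨ φ, if its root ε satisfies φ; we let
⟦φ⟧ = {t | t ⊨ φ} be the set of models of φ.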
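
The inductive semantics above can be turned almost literally into a naive evaluator. The following Python sketch is only an illustration under assumptions of our own: trees are encoded as dictionaries from node addresses (tuples of integers, with `()` for the root) to sets of propositions, and formulae as nested tuples tagged with hypothetical operator names such as `"dia"` or `"seq"`. It makes no attempt at the efficiency of dedicated model-checking algorithms.

```python
# Naive evaluator for the node/path semantics (illustrative encoding only).
# A tree is a dict: address (tuple of ints, () is the root) -> set of propositions.

def compose(r, s):
    """Composition of two binary relations given as sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def star(r, dom):
    """Reflexive transitive closure of r over the node set dom."""
    closure = {(u, u) for u in dom} | set(r)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger

def path(pi, t):
    """Interpretation of a path formula: a set of pairs of nodes."""
    op = pi[0]
    if op == "down":
        return {(v[:-1], v) for v in t if v}
    if op == "right":
        return {(v, v[:-1] + (v[-1] + 1,)) for v in t
                if v and v[:-1] + (v[-1] + 1,) in t}
    if op == "inv":
        return {(b, a) for (a, b) in path(pi[1], t)}
    if op == "seq":
        return compose(path(pi[1], t), path(pi[2], t))
    if op == "alt":
        return path(pi[1], t) | path(pi[2], t)
    if op == "star":
        return star(path(pi[1], t), t)
    if op == "test":
        return {(u, u) for u in node(pi[1], t)}
    raise ValueError(op)

def node(phi, t):
    """Interpretation of a node formula: a set of nodes."""
    op = phi[0]
    if op == "top":
        return set(t)
    if op == "prop":
        return {u for u in t if phi[1] in t[u]}
    if op == "not":
        return set(t) - node(phi[1], t)
    if op == "and":
        return node(phi[1], t) & node(phi[2], t)
    if op == "dia":
        target = node(phi[2], t)
        return {a for (a, b) in path(phi[1], t) if b in target}
    raise ValueError(op)

def satisfies(t, phi):
    """A tree satisfies phi if its root does."""
    return () in node(phi, t)

# A root labeled S with two children labeled a and b.
tree = {(): {"S"}, (0,): {"a"}, (1,): {"b"}}
TOP = ("top",)
leaf = ("not", ("dia", ("down",), TOP))
root = ("not", ("dia", ("inv", ("down",)), TOP))
```

With this encoding, `node(leaf, tree)` collects the two children and `node(root, tree)` only the root, matching the intended reading of the two formulae.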

Example 1 (Basic Navigation). Several simple formulae helping navigation can
be defined: root = ¬⟨↑⟩⊤ holds only at the root, leaf = ¬⟨↓⟩⊤ only at a leaf
node, first = ¬⟨←⟩⊤ at a leftmost one, and last = ¬⟨→⟩⊤ at a rightmost one.

We can also define the first-child relation ↙ = ↓; first?, and conversely
express the child relation as ↓ = ↙; →*: this shows that we could work on
binary tree models instead of the unranked ones we used in our definitions.



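Example 2 (Parse Trees (Blackburn et al. 1993)). Recall that a context-free
grammar (CFG) is a tuple G = (N, Σ, P, S) composed of a finite nonterminal
alphabet N, a finite terminal alphabet Σ disjoint from N and forming a
vocabulary V = N ⊎ Σ, a finite set of productions P ⊆ N × V*, and an axiom S ∈ N.
We denote the empty sequence by ε and write Σ' = Σ ∪ {ε} and V' = V ⊎ {ε}.

Given a context-free grammar G, its set of parse trees forms a local tree
language, which can be expressed as ⟦φ_G⟧ for a PDL_tree formula φ_G with V'
as set of atomic propositions. First define a path formula π_α that defines a
sequence of sibling nodes labeled by α in V*:

\[
\pi_\alpha \stackrel{\text{def}}{=}
\begin{cases}
X?;\ {\rightarrow};\ \pi_{\alpha'} & \text{if } \alpha = X\alpha',\ X \in V,\ \alpha' \neq \varepsilon, \\
X?;\ \mathit{last}? & \text{if } \alpha = X \in V, \\
\varepsilon?;\ \mathit{last}? & \text{otherwise, i.e. if } \alpha = \varepsilon .
\end{cases}
\]

\[
\begin{array}{rll}
\varphi_G \stackrel{\text{def}}{=} & S & \text{(the root is labeled by } S\text{)} \\
\wedge & [\downarrow^{*}] \Big( \mathit{leaf} \equiv \bigvee_{a \in \Sigma'} a \Big) & \text{(leaves are terminals and internal nodes nonterminals)} \\
\wedge & [\downarrow^{*}] \bigwedge_{A \in N} \Big( A \supset \bigvee_{A \to \alpha \in P} \langle {\swarrow};\ \pi_\alpha \rangle \top \Big) & \text{(productions are enforced)}
\end{array}
\]

2.2 The Conditional Fragment

We will consider in this paper several fragments of PDL_tree; most importantly
the conditional path fragment PDL_cp (Palm 1999; Marx 2005), with a restricted
syntax on path formulae

\[
\begin{array}{rcll}
\pi & ::= & \alpha \mid \pi ; \pi \mid \pi + \pi \mid \varphi? \mid (\alpha ; \varphi?)^{*} & \text{(conditional paths)} \\
\alpha & ::= & {\leftarrow} \mid {\rightarrow} \mid {\uparrow} \mid {\downarrow} & \text{(atomic paths)}
\end{array}
\]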

This fragment is of particular relevance, because it extends the core language
PDL_core (Blackburn et al. 1996; Gottlob and Koch 2002) (which features α*
instead of (α; φ?)*) and captures exactly first-order logic over finite ordered
trees with the two relations →⁺ and ↓⁺ (Marx 2005).



Example 3 (General Successor). Observe that the formulae in Examples 1 and 2
are actually in PDL_core. The general successor relation
≺ = (last?; ↑)*; →; (↓; first?)* is an example of a path that is not definable
in PDL_core; this can be checked for instance using an Ehrenfeucht-Fraïssé
argument.

We denote by PDL_tree[↓] (resp. PDL_cp[↓], PDL_core[↓]) the fragments with
only downward navigation, i.e. without the →, ←, and ↑ atomic paths.



3 Model-Checking Parse Forests

Many problems arising naturally with PDL_tree are decidable, notably the

model-checking problem: given a tree t and a formula φ, does t ⊨ φ? This
is known to be in PTime even for larger fragments of PDL (Lange 2006).

satisfiability problem: given a formula φ, does there exist a tree t s.t. t ⊨ φ?
This is known to be ExpTime-complete (Afanasiev et al. 2005).



In the context of XML processing and XPath, an intermediate question between
model-checking and satisfiability also arises:

satisfiability in presence of a tree language: given a formula φ and a
regular tree language L, does there exist a tree t ∈ L s.t. t ⊨ φ?

Due to its initial XML motivation, the basic case for this problem is that of
a PDL_core[↓] formula (a downward Core XPath query) and of a local tree
language (described by a DTD), but many variants exist (Benedikt et al. 2008;
Montazerian et al. 2007; Ishihara et al. 2009). Our own flavour is motivated
by applications in computational linguistics and programming languages, where
the tree language is the set of parse trees of a word w in Σ* according to a CFG
G = (N, Σ, P, S) verifying V' = Σ ⊎ N ⊎ {ε} = AP.



More precisely, following a well-known construction of Bar-Hillel et al. (1961),
if w = a₁ ⋯ a_n is a word of length n, the set of parse trees or parse forest of a
CFG G for w, written L_{G,w}, is the regular tree language recognized by a tree
automaton A_{G,w} with state set

\[
Q_{G,w} \stackrel{\text{def}}{=} \{ (i, X, j) \mid 0 \le i \le j \le n,\ X \in V' \} ,
\]

alphabet V', initial state (0, S, n), and rules

\[
\begin{array}{rl}
\delta_{G,w} \stackrel{\text{def}}{=} & \{ (i_0, A, i_m) \to A((i_0, X_1, i_1) \cdots (i_{m-1}, X_m, i_m)) \mid A \to X_1 \cdots X_m \in P \wedge 0 \le i_0 \le \cdots \le i_m \le n \} \\
\cup & \{ (i, a_{i+1}, i+1) \to a_{i+1}() \mid 0 \le i < n \} \ \cup\ \{ (i, \varepsilon, i) \to \varepsilon() \mid 0 \le i \le n \} .
\end{array}
\]

Intuitively, a state (i, X, j) of this automaton recognizes the set of trees derivable
in G from the symbol X and spanning the factor a_{i+1} ⋯ a_j of w.
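
The construction can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own, not an optimized parser: productions are given as pairs `(A, rhs)` with `rhs` a non-empty tuple over V' (an ε-production is written `(A, ("ε",))`), rules are triples `(state, label, children)`, and the names `forest_rules` and `productive_states` are hypothetical. The second function computes the states that recognize at least one tree, which is what trimming the automaton relies on.

```python
from itertools import combinations_with_replacement

EPS = "ε"  # label of empty leaves

def forest_rules(productions, word):
    """Rules (state, label, children) of the parse forest automaton.

    productions: list of (A, rhs), rhs a non-empty tuple over V'.
    word: sequence of terminal symbols a_1 ... a_n.
    """
    n = len(word)
    rules = []
    for lhs, rhs in productions:
        m = len(rhs)
        # all index sequences 0 <= i_0 <= ... <= i_m <= n
        for idx in combinations_with_replacement(range(n + 1), m + 1):
            children = tuple((idx[k], rhs[k], idx[k + 1]) for k in range(m))
            rules.append(((idx[0], lhs, idx[m]), lhs, children))
    for i in range(n):                      # terminal leaves
        rules.append(((i, word[i], i + 1), word[i], ()))
    for i in range(n + 1):                  # empty leaves
        rules.append(((i, EPS, i), EPS, ()))
    return rules

def productive_states(rules):
    """States from which at least one tree is recognized (least fixpoint)."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for state, _label, children in rules:
            if state not in productive and all(c in productive for c in children):
                productive.add(state)
                changed = True
    return productive

# Toy grammar S -> a S | a on the word "aa".
toy = [("S", ("a", "S")), ("S", ("a",))]
rules = forest_rules(toy, "aa")
states = productive_states(rules)
```

The word has a parse exactly when the initial state `(0, "S", n)` is productive; the number of generated rules grows as |w|^{m+1} with the longest right-hand side, in line with the size estimate discussed next.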

Parse Forest Model-Checking Problem (PFMC).

input a context-free grammar G, a word w, and a PDL_tree formula φ,

question does there exist t ∈ L_{G,w} s.t. t ⊨ φ? □



Note that the automaton A_{G,w} has size O(|G| · |w|^{m+1}) if m is the maximal
length of a production rightpart in G; since the grammar can be put in quadratic
form (corresponding to the binarization we would also perform on the formula),
this typically results in size O(|G| · |w|³). Therefore, although a tree automaton
for the tree language is not part of the input, it can nevertheless be constructed
in logarithmic space. The originality of the problem stems from considering
parse forests, which form a rather restricted class of tree languages.



In Section 4 we will investigate the complexity of this problem, and focus on
the influence of the acyclicity and ε-freeness of G. Define the derivation relation
⇒ between sequences in V* by βAγ ⇒ βαγ iff A → α is a production of G and
β, γ are arbitrary sequences in V*. A CFG is acyclic if none of its nonterminals
A allows A ⇒⁺ A. A CFG is ε-free if none of its productions is of form A → ε
for some nonterminal A.
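
Both restrictions are easy to test on a given grammar. The sketch below is an illustration with a hypothetical encoding (productions as pairs `(A, rhs)`, the empty tuple standing for ε). It relies on the standard observation that A ⇒⁺ A holds iff A lies on a cycle of the graph with an edge from A to B whenever some production A → βBγ has β and γ both deriving the empty sequence.

```python
def nullable_symbols(productions):
    """Nonterminals deriving the empty sequence (least fixpoint)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if lhs not in nullable and all(x in nullable for x in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def is_epsilon_free(productions):
    """No production of the form A -> ε."""
    return all(len(rhs) > 0 for _lhs, rhs in productions)

def is_acyclic(productions):
    """No nonterminal A with A =>+ A."""
    nonterminals = {lhs for lhs, _rhs in productions}
    nullable = nullable_symbols(productions)
    # edge A -> B iff A -> beta B gamma with beta, gamma nullable
    edges = {a: set() for a in nonterminals}
    for lhs, rhs in productions:
        for k, x in enumerate(rhs):
            rest = rhs[:k] + rhs[k + 1:]
            if x in nonterminals and all(y in nullable for y in rest):
                edges[lhs].add(x)

    def reachable(start):
        seen, stack = set(), list(edges[start])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(edges[x])
        return seen

    return all(a not in reachable(a) for a in nonterminals)

# S -> a S | a is acyclic and ε-free.
good = [("S", ("a", "S")), ("S", ("a",))]
# S -> A B, A -> ε, B -> S | b has the cycle S => A B => B => S.
bad = [("S", ("A", "B")), ("A", ()), ("B", ("S",)), ("B", ("b",))]
```

Here `good` passes both checks, while `bad` fails both: it has an ε-production, and erasing A exposes the unit cycle between S and B.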

In the remainder of this section, we motivate the problem by considering
applications in computational linguistics (Section 3.1) and compiler construction
(Section 3.2).



3.1 Application: Computational Linguistics 

In contrast with many formal theories of syntax that describe natural
language sentences through "generative-enumerative means", Pullum and Scholz
(2001) champion model-theoretic syntax, where the syntactic structures
(typically, trees) of a natural language are the models of some logical formula.
They point out interesting consequences on theories of syntax, but here we
thoroughly betray the spirit of their work in exchange for some practicality.

Indeed, the usual approach to model-theoretic syntax would be to describe
a language through a huge formula φ of PDL_tree or monadic second-order logic
(MSO) on trees. Checking whether a given sentence w can be assigned a
structure then reduces to a recognition problem on a tree automaton A_φ of
exponential (for PDL_tree) or non-elementary (for MSO) size (Cornell 2000).



A Mixed Approach. We consider a pragmatic approach, where

• a CFG describes the local aspects of syntax, e.g. the fact that a canonical
transitive French sentence can be decomposed into a noun phrase acting as
subject followed by a verb kernel and an object noun phrase corresponds to
a production S → NP VN NP, while

• long-distance dependencies and more complex linguistic constraints are
described through PDL_tree formulae.

Example 4 (French Clitics). A toy grammar for French sentences with
predicative verbs like "dire" or "demander" could look like (in an extended syntax
where X? describes zero or one occurrences of symbol X):

    S → NPsuj? VN VPinfobj? PPaobj?        d → la
    NPsuj → d n                            n → philosophe
    VN → clsuj? clobj? claobj? v           v → demande | réfléchir
    VPinfobj → de VN                       clsuj → elle
    PPaobj → à NP                          clobj → le
                                           claobj → lui

Such predicative verbs have a mandatory object and subject, and an optional 
indirect object. But all three canonical arguments can be replaced by clitics in 
the verb matrix VN. This grammar fragment generates reasonable sentences 
like 



    La philosophe demande de réfléchir. (The philosopher asks to think.)
    La philosophe le lui demande. (The philosopher asks it to her.)

where the "le" clitic acts as direct object and "lui" as an indirect one (see
Figure 1a for an example syntax tree). It also generates ungrammatical ones

    * Elle le lui demande de réfléchir. (She asks it to her to think.)
    * demande. (asks.)

where there are duplicated or missing arguments.

[Figure 1: Syntax trees for "la philosophe le lui demande." (a) Syntax tree
according to Example 4. (b) Analysis with moved constituents.]

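Instead of refining the grammar (which might prove impossible, for instance
if it was automatically extracted from a treebank, i.e. a set of sentences
annotated with syntactic trees), we can filter out the unwanted trees using a
PDL_tree formula. To improve readability, we take symbols like "VPinfobj" or
"clsuj" to denote sets of atomic propositions, respectively {VP, inf, obj} and
{cl, suj} in this instance, and refine our grammar with the following formula:

\[
\begin{array}{rll}
[\downarrow^{*}]\ \mathit{demande} \supset & \langle ({\uparrow};{\uparrow};{\rightarrow^{+}}) + ({\uparrow};{\leftarrow^{+}};\mathit{cl}?) \rangle\, \mathit{obj} & \text{(at least one object)} \\
\wedge & \langle ({\uparrow};{\uparrow};{\leftarrow^{+}}) + ({\uparrow};{\leftarrow^{+}};\mathit{cl}?) \rangle\, \mathit{suj} & \text{(at least one subject)} \\
\wedge & \bigwedge_{f \in \{\mathit{suj}, \mathit{obj}, \mathit{aobj}\}} \langle {\uparrow};{\leftarrow^{+}};\mathit{cl}? \rangle f \supset \neg \langle {\uparrow};{\uparrow};({\leftarrow^{+}} + {\rightarrow^{+}}) \rangle f &
\end{array}
\]

(a clitic argument forbids the corresponding canonical argument)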



Interestingly, such PDL_tree constraints can easily be tested against tree
corpora to check their validity; see (Lai and Bird 2010) on using PDL_tree-like
query languages to this end. We checked that the above PDL_tree formula was
satisfied by the trees in the Sequoia treebank (Candito and Seddah 2012) (using
an XPath processor).

Discussion. In this approach, the CFG can be a very permissive,
over-generating one, like the probabilistic grammars extracted from treebanks,¹
since it is later refined by the PDL_tree constraints. We are not aware of any
linguistic rationale for cycles in CFGs; on the other hand, ε-productions are
sometimes used as placeholders for moved constituents. However, in such
analyses, the moved constituent and the placeholder are coindexed, i.e. related
through an additional relation, which

• requires a richer class of models than mere trees over a finite alphabet if
we want to make the coindexation explicit (see Figure 1b for an example),
and

• can be simulated by a PDL_tree formula, as seen with the connection we
establish between a clitic and the corresponding missing argument in
Example 4.

We therefore expect our grammars to be both acyclic and ε-free, and we could
check that this was indeed the case on the three rather different CFGs proposed
by Moore (2004) for natural language parsing benchmarks.

On the logical side, it seems necessary to be able to use e.g. general successors
(recall Example 3). Palm (1999) and Lai and Bird (2010) argue that PDL_cp
provides an appropriate expressiveness for linguistic queries.



3.2 Application: Ambiguity Filtering 

Ambiguities in context-free grammars describing the syntax of programming
languages are a severe issue, as they might lead to different semantic
interpretations, and complicate the use of deterministic parsers, which then
basically require manual fiddling. They are also quite useful, as they allow for
more concise and more readable grammars, and it is actually uncommon to find a
language reference proposing an unambiguous grammar.

A nice way of dealing with ambiguities at parse time is to build a parse forest
and filter out the unwanted trees (Klint and Visser 1994). In contrast with
tinkering with parsers, this allows implementing the "side constraints" of
language references as declarative rules, which, beyond readability and
maintainability concerns (Kats et al. 2010), also enables some amount of static
reasoning and optimization.

Example 5 (Dangling Else). We propose to use PDL_tree formulae to filter out
unwanted parses. Consider the following regular tree grammar for statements:²

    S → st(if C then S) | se(if C then S else S) | sw(while C S) | ss(skip)
    C → ct(true) | cf(false)

Feeding this grammar to a LALR(1) parser generator like GNU/bison, we find
a single shift/reduce conflict, where the parser has a choice on inputs like "if
true then if true then skip else skip", upon reaching the "else" symbol, between
reading further (Figure 2a), and reducing first and leaving this else for later
(Figure 2b). The usual convention in programming languages is a greedy one,
where shift is always chosen. However, disambiguation by choosing between
shift or reduce parsing actions is error-prone, and there are cases where both
alternatives are incorrect on some inputs (see (Schmitz 2010) for an example
in Standard ML).

[Figure 2: Two parses for the ambiguous input "if true then if true then skip
else skip" with the grammar of Example 5. (a) Parse when preferring shift over
reduce. (b) Parse when preferring reduce over shift.]

¹ Moore (2004) finds an average of 7.2 × 10²⁷ different parse trees per sentence
with a grammar extracted from the Penn treebank!

² We use a regular tree grammar in a restricted way to label internal nodes
differently depending on the chosen production; this allows for a simpler
PDL_tree formula but has otherwise no impact as the language remains local.

A PDL_tree formula that accepts the desired tree of Figure 2a but rejects the
one of Figure 2b should check that no "else" node can be a general successor
(in the sense of Example 3) of an "st" node:

    ¬⟨↓*⟩(st ∧ ⟨≺⟩else) .

Observe that a general successor path ≺ is really needed here, because the "st"
node can be at the end of an arbitrarily long sequence of "sw" nodes from nested
"while" statements.



A very similar approach was proposed by Thorup (1994), who used simple
tree patterns for similar purposes. Both tree patterns and PDL_tree formulae
can be compiled into the grammar, so that only the desired trees can be
generated, allowing the use of deterministic parsers or ambiguity checking tools
(Schmitz 2010). PDL_tree formulae are strictly more expressive than patterns;
the dangling else example required an involved extension of patterns in
(Thorup 1996).
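
To make the dangling else filter of Example 5 concrete, here is a small Python sketch that hard-codes the general successor relation and the filter ¬⟨↓*⟩(st ∧ ⟨≺⟩else) on the two parses of "if true then if true then skip else skip". The encoding is an assumption of ours (trees as nested tuples `(label, child, ...)`, hypothetical function names), and the two trees are written down from the grammar of Example 5 rather than produced by a parser.

```python
def labels(tree, addr=(), out=None):
    """Map each node address (tuple of ints) to its label."""
    out = {} if out is None else out
    out[addr] = tree[0]
    for i, child in enumerate(tree[1:]):
        labels(child, addr + (i,), out)
    return out

def general_successors(lab, u):
    """Nodes related to u by (last?; up)*; right; (down; first?)*."""
    # (last?; up)*: climb while the current node is a non-root last child
    while u and u[:-1] + (u[-1] + 1,) not in lab:
        u = u[:-1]
    if not u:
        return set()  # reached the root, which has no right sibling
    # right: one step to the immediate right sibling
    v = u[:-1] + (u[-1] + 1,)
    # (down; first?)*: the sibling and its chain of first children
    result = set()
    while v in lab:
        result.add(v)
        v = v + (0,)
    return result

def rejected(tree):
    """True iff some st node has an else node as general successor."""
    lab = labels(tree)
    return any(lab[u] == "st" and
               any(lab[v] == "else" for v in general_successors(lab, u))
               for u in lab)

IF, THEN, ELSE = ("if",), ("then",), ("else",)
COND, SKIP = ("ct", ("true",)), ("ss", ("skip",))

# Shift preferred: the else belongs to the inner conditional.
shift_parse = ("st", IF, COND, THEN, ("se", IF, COND, THEN, SKIP, ELSE, SKIP))
# Reduce preferred: the else belongs to the outer conditional.
reduce_parse = ("se", IF, COND, THEN, ("st", IF, COND, THEN, SKIP), ELSE, SKIP)
```

The filter keeps `shift_parse` and discards `reduce_parse`; wrapping the inner `st` node into `sw` nodes leaves the verdict unchanged, which is the reason a general successor path is needed rather than a plain right-sibling step.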



Discussion. The grammars used for programming languages are always
acyclic (tools like GNU/bison will detect and reject cyclic grammars) but
ε-productions are fairly common.

On the logic side, PDL_cp is required in order to express general successor
paths as in Example 5. This seems expressive enough for most tasks, but
layout-sensitive syntax would be beyond its grasp: in programming languages like
Haskell or Python, the indentation level is used to delimit statement blocks;
differentiating between possible parses then requires some limited counting
capabilities, or at least infinite label sets.

    Grammar                      Complexity of PFMC
    (no restriction)             ExpTime-complete
    acyclic                      PSpace-complete
    ε-free                       PSpace-complete
    acyclic, ε-free              NPTime-complete

Figure 3: The complexity of the PFMC problem, depending on the grammar
characteristics.

Excluding a tree considered individually is one approach among others to
ambiguity filtering (Klint and Visser 1994). A popular alternative considers the
parse forest, i.e. the tree automaton A_{G,w} itself. The ambiguity resolution of
Example 5 on the input "if true then if true then skip else skip" can be simply
stated as a preference st > se implying that the rule

    (0, S, 9) → st((0, if, 1)(1, C, 2)(2, then, 3)(3, S, 9))

is preferred over the rule

    (0, S, 9) → se((0, if, 1)(1, C, 2)(2, then, 3)(3, S, 7)(7, else, 8)(8, S, 9))

in the automaton A_{G,w}. Such disambiguation rules are easy to write, but they
are also inherently dynamic: they cannot be compiled into the grammar, because
whether the rule will be triggered depends on whether an ambiguity appears
there, which is an undecidable problem.

4 Complexity Results 

We investigate in this section the complexity of the parse forest model-checking
problem. We obtain a classification of complexities depending on the properties
of the grammar (see Figure 3). Interestingly, our hardness results always hold
for a formula φ in the rather restricted fragment PDL_core[↓], and generally hold
already for fixed G and/or w. These bounds use logarithmic space reductions.

Turning first to the complexity in the general case, an immediate
consequence of classical results in the field (e.g. Calvanese et al. 2009,
Theorem 7) is that it lies in ExpTime.

Proposition 1. PFMC is in ExpTime. 

Proof Sketch. We can assume G to be in quadratic form and φ to work on
binary trees that encode unranked trees with the ↙ and → relations, as these
transformations only incur a linear cost. Then, construct the tree automaton
A_{G,w} of size O(|G| · |w|³) that recognizes the set of parse trees of w in G and
the tree automaton A_φ of size 2^{p(|φ|)} for a polynomial p that recognizes the
models of φ: it suffices to test the emptiness of their product automaton, which
can be performed in time linear in |G| · |w|³ · 2^{p(|φ|)}. □

An interesting consequence of the proof of Proposition 1 is that the PFMC
problem is PTime-complete when the PDL_tree formula is fixed, pleading for
using small formulae in practice.

Our proof for Proposition 1 does not benefit from the specificities of the
PFMC problem: any satisfiability problem in presence of a tree language would
use the same algorithm. Therefore, we might still hope for the existence of a
more efficient solution, but adapting the proof of ExpTime-hardness for PDL
satisfiability from (Blackburn et al. 2001), we obtain:

Proposition 2. PFMC is ExpTime-hard, even for fixed G and w and for φ
in PDL_core[↓].

Proof Idea. We reduce from the two-player corridor tiling game of Chlebus
(1986). We fix w = ε and also fix G to generate a parse forest encoding game
trees; we use a PDL_core[↓] formula φ to check that there exists a winning
strategy. See Appendix A for details. □

As can be seen from this proof idea, the fact that w = ε and G is cyclic plays
an important role, because the parse forest is essentially unconstrained. This
is a good incentive to examine what happens when G is acyclic and/or ε-free,
especially since those cases are most relevant for the applications we described
in Section 3.

4.1 The Acyclic ε-free Case: Mixed Model-Theoretic Syntax

Let us therefore consider the other end of our spectrum, which we claimed
was of particular relevance for the mixed approach to model-theoretic syntax
we presented in Section 3.1: if G is acyclic and ε-free, then A_{G,w} is a
non-recursive tree automaton generating a finite parse forest, although it might
contain exponentially many trees. This yields an ExpTime algorithm that performs
PDL_tree model-checking (in PTime (Lange 2006)) on each tree individually.

We can try to refine this first approach and resort to (Benedikt et al. 2008,
Lemma 7.5), which entails that the problem for the PDL_core fragment is in
PSpace, but we can do a bit better:

Proposition 3. PFMC with acyclic and ε-free grammars is NPTime-complete;
hardness holds even for fixed G and for φ in PDL_core[↓].

Proof Idea for the Upper Bound. We show that the parse trees in L(A_{G,w}) are
of polynomial size in |G| and |w|, hence we can nondeterministically guess such
a tree and check that it is a model of φ in polynomial time. See Appendix B.1
for details. □

Proof Idea for the Lower Bound. We reduce from 3SAT with a fixed grammar
G and a PDL_core[↓] formula φ; see Appendix B.2 for details. □
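
As an illustration of the guess-and-check idea behind the upper bound, the following Python sketch replaces the nondeterministic guess by a deterministic enumeration of all parse trees of an acyclic, ε-free grammar, and filters them with an arbitrary Python predicate standing in for the polynomial-time check of φ. The encoding (productions as pairs `(A, rhs)`, trees as nested tuples) and the function names are assumptions of ours; the enumeration may of course take exponential time, since the forest can contain exponentially many trees.

```python
def parse_trees(productions, word, symbol, i, j):
    """Yield the parse trees rooted in `symbol` spanning word[i:j].

    Assumes an acyclic, ε-free grammar, so every symbol spans at least
    one letter and the recursion terminates.
    """
    nonterminals = {lhs for lhs, _rhs in productions}
    if symbol not in nonterminals:
        if j == i + 1 and word[i] == symbol:
            yield (symbol,)
        return
    for lhs, rhs in productions:
        if lhs == symbol:
            for children in child_sequences(productions, word, rhs, i, j):
                yield (symbol,) + children

def child_sequences(productions, word, rhs, i, j):
    """Yield tuples of trees for the symbols of rhs, jointly spanning word[i:j]."""
    if not rhs:
        if i == j:
            yield ()
        return
    head, tail = rhs[0], rhs[1:]
    # head takes at least one letter and leaves at least one per symbol of tail
    for k in range(i + 1, j - len(tail) + 1):
        for first in parse_trees(productions, word, head, i, k):
            for rest in child_sequences(productions, word, tail, k, j):
                yield (first,) + rest

def right_nested(tree):
    """Stand-in constraint: some sum has a sum as right operand."""
    if len(tree) == 4 and len(tree[3]) == 4:
        return True
    return any(right_nested(child) for child in tree[1:])

# Ambiguous toy grammar E -> E + E | a on the word a+a+a.
sums = [("E", ("E", "+", "E")), ("E", ("a",))]
forest = list(parse_trees(sums, "a+a+a", "E", 0, 5))
kept = [t for t in forest if not right_nested(t)]
```

On this input the forest contains the two bracketings of the sum, and the stand-in constraint keeps only the left-nested one.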



4.2 Non-Recursive DTDs 

Let us turn now to the more involved cases where G is either acyclic or ε-free:
we rely in both cases for the upper bounds on the same result that extends
Lemma 7.5 of Benedikt et al. (2008) to handle PDL_tree instead of PDL_core.

Proposition 4. Satisfiability of PDL_tree in presence of a non-recursive DTD is
PSpace-complete.

Let us recall that a document type definition (DTD) is a generalized CFG
D = (N, P, S) where P is a mapping from N to content models in Reg(N*),
the set of regular languages over N; we will assume these content models to be
described by finite automata (NFA). Given D, the derivation relation ⇒ relates
βAγ to βαγ iff α is in P(A); a DTD is non-recursive if no nonterminal has a
derivation A ⇒⁺ βAγ for some β, γ in N*. Note that a non-recursive DTD
might still generate an infinite tree language, but that all its trees will have a
depth bounded by |N|.



Proof Idea for Proposition 4. The hardness part is proven by Benedikt et al.
(2008) in their Proposition 5.1.

For the upper bound, we reduce to the emptiness problem of a 2-way
alternating parity word automaton, which is in PSpace (Serre 2006). The key idea,
found in Benedikt et al.'s work, is to encode trees of bounded depth as XML
strings (i.e. with opening and closing tags): both the DTD D and the formula φ
can then be encoded as alternating parity word automata A_D and A_φ of
polynomial size. Our construction for A_φ is a bit different from that of Benedikt
et al.; for instance, we cannot assume it to be loop-free. See Appendix C for
details. □

4.2.1 Acyclic Case: Ambiguity Filtering 

We are now ready to attack the case of acyclic grammars. This restriction is
enough to ensure that the parse forest is finite, and, more importantly, A_{G,w} is
trivially non-recursive, thus Proposition 4 immediately yields a PSpace upper
bound. In fact, this is optimal:

Proposition 5. PFMC with acyclic grammars is PSpace-complete; hardness
holds even for fixed w and for φ in PDL_core[↓].

Proof Sketch. Because G is acyclic, for any w, the trimmed version of A_{G,w} is a
non-recursive tree automaton. Indeed, in this automaton, if a state (i₀, A, i_k) of
A_{G,w} rewrites in n steps into a tree t with leaves labeled by
(i₀, X₁, i₁) ⋯ (i_{k-1}, X_k, i_k), then A ⇒ⁿ X₁ ⋯ X_k in G. If the automaton is trim,
then the existence of a state (i, B, j) implies that B derives the factor
a_{i+1} ⋯ a_j of w. Thus, if (i, A, j) were to rewrite in at least one step into a tree
C[(i, A, j)], then there would be a cycle A ⇒⁺ A in G, a contradiction. It remains
to relabel the rules (i, A, j) → A(q₁ ⋯ q_m) of A_{G,w} to
(i, A, j) → (i, A, j)(q₁ ⋯ q_m) to obtain a local non-recursive tree automaton, which
is just a particular case of a non-recursive DTD, and interpret the propositions
p in V' = AP as ⋁_{0≤i≤j≤n} (i, p, j) over Q_{G,w} in φ to apply Proposition 4 and
obtain the upper bound.

The lower bound holds for the PDL_core[↓] satisfiability problem in presence of
non-recursive and no-star DTDs (Benedikt et al. 2008, Proposition 5.1), which
is easy to reduce to our problem by simply adding ε-leaves in the DTD; this
lower bound thus already holds for a fixed w = ε. □

4.2.2 ε-Free Case: PDL_tree Recognition

We reach the last case of our study. Due to cycles, an ε-free grammar G can have
infinitely many parses for a given input string w, and its parse trees can have
unbounded depth. Nevertheless, recursions in a parse forest of an ε-free grammar
display a particular shape: they are chains of unit rules q_i → A_i(q_{i+1}). The key
idea here is that such chains define regular languages of single-strand branches,
which can be encoded in a non-recursive DTD by "rotating" them, i.e. seeing the
chain as a sequence of siblings instead of a sequence of parents, taking
advantage of the DTD's ability to describe trees of unbounded rank.

Proposition 6. PFMC with ε-free grammars is in PSpace.

Proof Idea. The algorithm starts by constructing A_{G,w} in polynomial time on
binarized trees; we want to reduce the problem to the satisfiability problem for
PDL_tree in presence of a non-recursive DTD and use Proposition 4. As in the
proof of Proposition 5, we consider "localized" rules q → q(q₁ q₂) of A_{G,w}, and
replace them by productions of the form q → chains(q₁) chains(q₂) where the
chains(q_i) are the languages of single chains out of q_i. By suitably labeling our
trees, we can interpret φ over those transformed trees. See Appendix D.1 for
details. □

Proposition 6 is optimal:

Proposition 7. PFMC with ε-free grammars is PSpace-hard, even for fixed
G and w and for φ in PDL_core[↓].

Proof Idea. The proof is by reduction from membership in a linear bounded
automaton. We fix w = a for some symbol a of Σ, and also fix the CFG G to
basically generate any single-strand tree with a root S and a leaf a over a fixed
alphabet. A PDL_core[↓] formula of polynomial size then checks that this tree
encodes an accepting run of the LBA. See Appendix D.2 for details. □

PDL_tree Recognition. A key question if model-theoretic syntax is to be
used in practice for natural language processing is the following recognition
problem:

PDL_tree Recognition Problem.

input a PDL_tree formula φ, a word w in AP*, and a distinguished proposition
s in AP,

question does there exist a tree t with yield w and root label s s.t. t ⊨ φ? □

Note in particular that the statement of the problem excludes ε-labeled
leaves, which would require a different formulation and would yield an
ExpTime-complete problem.

The previous approaches to the recognition problem have used tree automata
techniques (e.g. Cornell 2000) or tableau-like techniques (Palm 2004). In both
cases, exponential time upper bounds were reported by the authors (to be fair,
these algorithms solve the parsing problem and find a representation of all the
parses for $w$ compatible with $\varphi$), but we can improve on this thanks to
Proposition 6:

Corollary 1. $\mathrm{PDL}_{\mathrm{tree}}$ recognition is PSpace-complete; hardness holds even for
fixed $w$ and for $\varphi$ in $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$.

Proof Sketch. The lower bound stems from an easy reduction from
Proposition 7: we can encode the grammar $G$ into a $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$ formula $\varphi_G$ as in
Example 2 and reduce to the parsing problem for $w$ and $\varphi \wedge \varphi_G$.

For the upper bound, we can assume as usual $\varphi$ to work on a binary encoding
of trees. The idea is to reduce to the PFMC problem with a "universal" CFG
that accepts all the trees of rank at most 2 over $AP$. A small issue is that we
need to separate nonterminal from terminal labels, but we can create a
disjoint copy $N \stackrel{\text{def}}{=} \{P \mid p \in AP\}$ of $AP$ and interpret $\varphi$ as a formula over $N \uplus AP$
with $P \vee p$ as the interpretation of $p$. This grammar then has $S$ as axiom and
productions $A \to X\,Y$ and $A \to X$ for all $A$ in $N$ and $X, Y$ in $V$, and we can
resort to Proposition 6 to conclude. □
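
The "universal" grammar used in this proof sketch can be written down explicitly. The following is only a minimal sketch, under the assumption that atomic propositions are lowercase strings, so that upper-casing yields the disjoint copy $N$; the function name `universal_cfg` is hypothetical and not taken from the paper.

```python
from itertools import product

def universal_cfg(atomic_props):
    """Return (nonterminals, productions) of a 'universal' CFG over AP.

    Assumption: propositions are lowercase strings, so p.upper() gives a
    disjoint copy N of AP. Productions are pairs (lhs, rhs_tuple).
    """
    nonterminals = {p.upper() for p in atomic_props}
    vocabulary = nonterminals | set(atomic_props)   # V = N ⊎ AP
    productions = set()
    for a in nonterminals:
        for x in vocabulary:                        # A -> X
            productions.add((a, (x,)))
        for x, y in product(vocabulary, repeat=2):  # A -> X Y
            productions.add((a, (x, y)))
    return nonterminals, productions
```

The grammar has $|N| \cdot (|V| + |V|^2)$ productions, which is polynomial in $|AP|$.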



5 Conclusion 

Because $\mathrm{PDL}_{\mathrm{tree}}$ formulae can freely navigate in trees, properties that rely on
long-distance relations are convenient to express, in contrast with the highly local
view provided by a grammar production. However, this expressiveness comes
at a steep price, as decision problems on $\mathrm{PDL}_{\mathrm{tree}}$ are typically
ExpTime-complete instead of PTime-complete on CFGs.

The $\mathrm{PDL}_{\mathrm{tree}}$ model-checking of the parse trees of a CFG allows one to mix the
two approaches, using a grammar for the bulk work of describing trees and using
more sparingly a $\mathrm{PDL}_{\mathrm{tree}}$ formula for the fine work. We argue that this
trade-off would find natural applications in computational linguistics and compilers
construction, where sensible restrictions on the grammar lower the complexity
to NPTime or PSpace.

An additional consequence is that the recognition problem for $\mathrm{PDL}_{\mathrm{tree}}$ is in
PSpace. This is a central problem in model-theoretic syntax, and this lower
complexity suggests that "lazy" approaches, in the spirit of the tableau
construction of Palm (2004), should perform significantly better than the automata
constructions of Cornell (2000).



A General Case



PDL satisfiability is known to be ExpTime-complete in general (Fischer and
Ladner 1979). The general case of the parse forest model-checking problem,
i.e. when $G$ is an arbitrary grammar, is also ExpTime-complete. The upper
bound follows from classical techniques (Calvanese et al. 2009); see the proof
sketch of Proposition 1.

The lower bound could be proven by a reduction from PDL satisfiability
using a "universal" CFG as in the proof of Corollary 1. However, this proof
does not lend itself very easily to the restricted case we want to consider, where
$w$ and $G$ are fixed and $\varphi$ is a downward $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$ formula. We present in this
section a reduction from the two-player corridor game, which is known to be
ExpTime-hard (Chlebus 1986), adapted from a similar proof for the hardness
of PDL satisfiability by Blackburn et al. (2001, Theorem 6.52).



Two Player Corridor Game. The game sees two players, Eloise and Abelard,
compete by tiling a corridor. The tiles are squares decorated by $s + 2$ different
patterns $T = \{t_0, \dots, t_{s+1}\}$; two binary relations $U$ and $R$ over $T$ tell if a tile
can be placed on top of the other and to the right of the other. Two tiles are
distinguished: $t_0$ is called the white tile and $t_{s+1}$ the winning tile. The corridor
is made of $n + 2$ columns of infinite height, with the first and last columns filled
with white tiles $t_0$ and delimiting $n$ columns for the play. The initial bottom
row is tiled by a sequence $I_1 \cdots I_n$ of tiles, which is assumed to be correct, i.e.
to respect the $R$ relation.

The players alternate and choose a next tile in $T$ and place it in the next
position, which is the lowest leftmost free one; thus the chosen tile should match
the tile to its left (using $R$) and the tile below (using $U$); see Figure 4a. Eloise
starts the game and wins if, after a finite number of rounds, the winning tile $t_{s+1}$
is put in column 1. Given an instance of the two-player corridor tiling game, i.e.
$(s + 2, I_1 \cdots I_n, R, U)$, deciding whether Eloise has a winning strategy, i.e. a way
of winning no matter what Abelard plays, is ExpTime-complete.
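
The placement rule can be illustrated by a small sketch that lists the tiles a player may legally put in a given column. Everything about the representation is an assumption made for illustration: tiles are integers with 0 for the white tile, `R` and `U` are sets of pairs where `(x, y) in R` means `y` may sit to the right of `x` and `(x, y) in U` means `y` may sit on top of `x`, and `top` lists the tile currently on top of each of the $n + 2$ columns. The check against the right wall for column $n$ follows the adjacency predicate used later in the reduction.

```python
def legal_tiles(column, top, tiles, R, U, n):
    """Tiles that may be placed in `column` (1..n) of the corridor.

    Assumed conventions (not fixed by the paper): (x, y) in R means y fits to
    the right of x; (x, y) in U means y fits on top of x; top[0] and top[n + 1]
    are the white wall tile 0.
    """
    below = top[column]        # tile currently on top of the played column
    left = top[column - 1]     # tile already in place to the left, or the wall
    legal = []
    for t in tiles:
        if (below, t) not in U:
            continue           # does not fit on the tile below
        if (left, t) not in R:
            continue           # does not fit next to its left neighbour
        if column == n and (t, 0) not in R:
            continue           # last column must also match the right wall
        legal.append(t)
    return legal
```

Abelard's nodes in a strategy tree must offer one successor per element of this list, whereas Eloise only needs one of them.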



Notation. We represent strategy trees as parse trees. Our $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$
formula $\varphi$ will ensure that the parse tree is indeed a valid game tree, and that it
encodes a winning strategy for Eloise.

A game turn is encoded locally by an $X$-labeled node and its immediate
children, with the next configurations reachable through a path of
$M$-labeled nodes. More precisely, each $X$ node has the following children (see
Figure 4b):

• a node labeled either $W$ or $L$, stating that the configuration is winning or
not for Eloise,

• a node labeled either $E$ or $A$, stating whether it is Eloise's or Abelard's
turn to move,

• a chain of $i$ $P$-labeled nodes, stating that the current playing column is
$C_i$,

• a chain of $j + 1$ $T$-labeled nodes, stating that the chosen tile at this turn
is $t_j$,

Figure 4: A turn of the 2-players corridor game and its tree encoding. (a) One of Eloise's turns in the game. (b) The tree encoding of the turn.



• a comb-shaped subtree of $n+2$ nodes labeled by $C$, describing the contents
of the top layer in the corridor (which in general spans two rows), with a
strand of $k + 1$ $T$-labeled nodes telling for each column that tile $t_k$ is on
top,

• a chain of $m$ nodes labeled 0 or 1, encoding in binary the number of
the moves made so far; this chain does not need to be longer than
$m \stackrel{\text{def}}{=} \lceil \log(2n^{s+3}) \rceil$, or some move would have been repeated.



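The Grammar. We fix $w = \varepsilon$ and $G = (N, \emptyset, P, X)$ over the nonterminal
alphabet $N = \{X, M, W, L, E, A, P, T, C, 0, 1\}$ with productions (with a slightly
extended syntax with alternatives built into the right-hand sides of productions):

$$X \to M\,(W \mid L)\,(E \mid A)\,P\,T\,C\,(0 \mid 1) \qquad M \to X \mid X\,M \mid \varepsilon \qquad C \to T \mid T\,C$$

$$P \to P \mid \varepsilon \qquad T \to T \mid \varepsilon \qquad W \to \varepsilon$$

$$L \to \varepsilon \qquad E \to \varepsilon \qquad A \to \varepsilon$$

$$0 \to 0 \mid 1 \mid \varepsilon \qquad 1 \to 0 \mid 1 \mid \varepsilon$$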

Observe that all the tree encodings of strategies are generated by $G$, but that
not all the trees of $G$ encode a strategy: for instance, the number of $C$'s might
be different from $n + 2$, or the described tiling might not respect the placement
constraints, etc. The formula will check these conditions.

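Game Structure and Mechanics. We use the non-terminal labels and
$\varepsilon$ as atomic propositions in our $\mathrm{PDL}_{\mathrm{tree}}$ formula. Because there are at most $s + 2$
choices of tiles at each turn, we can define the path

$$\mathrm{move} \stackrel{\text{def}}{=} \sum_{i=1}^{s+2} (\downarrow; M?)^{i}; \downarrow; X?$$

that relates two successive configurations. Further define the following formulae
for $0 \le i \le n + 1$, $0 \le j \le s + 1$, $1 \le a \le m$ and $b$ in $\{0, 1\}$:

$$p(i) \stackrel{\text{def}}{=} \langle(\downarrow; P?)^{i}; \downarrow\rangle\varepsilon \qquad t(j) \stackrel{\text{def}}{=} \langle(\downarrow; T?)^{j+1}; \downarrow\rangle\varepsilon$$

$$q(a,b) \stackrel{\text{def}}{=} \langle(\downarrow; (0+1)?)^{a}\rangle b \qquad c(i,j) \stackrel{\text{def}}{=} \langle(\downarrow; C?)^{i+1}; (\downarrow; T?)^{j+1}; \downarrow\rangle\varepsilon$$

In an $X$ node, $p(i)$ holds if the current column is $C_i$, $t(j)$ if the chosen tile is
$t_j$, $c(i, k_i)$ if tile $t_{k_i}$ is on top of column $C_i$, and $q(a, b)$ if the binary encoding
of the current move number has bit $a$ set to $b$.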

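We ensure some preliminary structure on the game tree: at the start of the
play, the current player must be Eloise, and the referee should have placed the
initial tiles in the first row. The counter must be initialized to zero.

$$\varphi_1 \stackrel{\text{def}}{=} X \wedge (\langle\downarrow\rangle E) \wedge p(1) \wedge \bigwedge_{i=1}^{n} c(i, I_i) \wedge \bigwedge_{a=1}^{m} q(a, 0) .$$

In every state, the tiles in columns 0 and $n + 1$ should be the white tile $t_0$.

$$\varphi_2 \stackrel{\text{def}}{=} [\downarrow^*] X \supset c(0,0) \wedge c(n + 1, 0) .$$

In every state with current column $i$, the next move should be at position
$(i \bmod n) + 1$.

$$\varphi_3 \stackrel{\text{def}}{=} \bigwedge_{i=1}^{n} [\downarrow^*](X \wedge p(i)) \supset [\mathrm{move}]\, p((i \bmod n) + 1) .$$

The columns are updated with the correct tiles.

$$\varphi_4 \stackrel{\text{def}}{=} \bigwedge_{i=1}^{n} \bigwedge_{j=0}^{s+1} [\downarrow^*](X \wedge p(i) \wedge t(j)) \supset [\mathrm{move}]\, c(i,j)$$

$$\qquad \wedge \bigwedge_{i \neq j = 1}^{n} \bigwedge_{k=0}^{s+1} [\downarrow^*](X \wedge p(i) \wedge c(j,k)) \supset [\mathrm{move}]\, c(j,k) .$$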

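Players alternate.

$$\varphi_5 \stackrel{\text{def}}{=} [\downarrow^*](X \wedge \langle\downarrow\rangle E \Leftrightarrow \langle\mathrm{move}\rangle\langle\downarrow\rangle A) .$$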
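
The chosen tiles verify the adjacency constraints: define for this the Boolean

$$\mathrm{adj}(i, j, k, \ell) \stackrel{\text{def}}{=} t_\ell\, U\, t_j \wedge (i > 1 \supset t_k\, R\, t_j) \wedge (i = 1 \supset t_0\, R\, t_j) \wedge (i = n \supset t_j\, R\, t_0)$$

$$\varphi_6 \stackrel{\text{def}}{=} \bigwedge_{i=1}^{n} \bigwedge_{j,k,\ell=0}^{s+1} [\downarrow^*](X \wedge p(i) \wedge t(j) \wedge c(i-1,k) \wedge c(i,\ell)) \supset \mathrm{adj}(i,j,k,\ell)$$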

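The counter is incremented.

$$\varphi_7 \stackrel{\text{def}}{=} \bigwedge_{d=1}^{m} \bigwedge_{a=1}^{d-1} \bigwedge_{b \in \{0,1\}} [\downarrow^*]\Big(X \wedge q(a,b) \wedge q(d,0) \wedge \bigwedge_{e=d+1}^{m} q(e,1)\Big)$$

$$\qquad \supset [\mathrm{move}]\Big(q(a,b) \wedge q(d,1) \wedge \bigwedge_{e=d+1}^{m} q(e,0)\Big)$$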
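
The counter formula is the usual binary increment, read bit by bit. As a hedged illustration (the list representation and the function name `increment` are assumptions, not part of the reduction), bit 1 is the node closest to the $X$ node and bit $m$ is the least significant one:

```python
def increment(bits):
    """Binary increment as constrained by the counter formula.

    bits[0] is bit 1 (closest to the X node), bits[-1] is bit m (least
    significant). Returns the successor, or None when every bit is 1, in
    which case no antecedent of the formula applies.
    """
    m = len(bits)
    for d in range(m - 1, -1, -1):
        # look for position d holding 0 with only 1s after it
        if bits[d] == 0 and all(bit == 1 for bit in bits[d + 1:]):
            # bits before d are copied, bit d becomes 1, later bits become 0
            return bits[:d] + [1] + [0] * (m - d - 1)
    return None
```

Each conjunct of the formula fixes one choice of $d$, one earlier position $a$ and its value $b$, which is why the formula has polynomially many conjuncts.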

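Winning Strategy. The previous formulae were making sure that the tree
would be a proper game tree. We now want to check that it describes a winning
strategy for Eloise. We should check that all the possible moves of Abelard are
tested:

$$\varphi_8 \stackrel{\text{def}}{=} \bigwedge_{i=1}^{n} \bigwedge_{j,k,\ell=0}^{s+1} [\downarrow^*](X \wedge p(i) \wedge \langle\downarrow\rangle E \wedge t(k) \wedge c((i \bmod n) + 1, \ell) \wedge \mathrm{adj}(i,j,k,\ell)) \supset \langle\mathrm{move}\rangle\, t(j)$$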

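Finally, the winning condition should be met:

$$\varphi_9 \stackrel{\text{def}}{=} \langle\downarrow\rangle W \wedge [\downarrow^*]\big(X \wedge \langle\downarrow\rangle W \supset (c(1, s+1)$$

(the game is immediately winning)

$$\vee\ \langle\downarrow\rangle E \wedge \langle\mathrm{move}; \downarrow\rangle W$$

(Eloise can win later)

$$\vee\ \langle\downarrow\rangle A \wedge \langle\mathrm{move}\rangle\top \wedge [\mathrm{move}]\langle\downarrow\rangle W)\big) .$$

(None of Abelard's moves can prevent Eloise from winning)

Our final $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$ formula is $\varphi = \bigwedge_{i=1}^{9} \varphi_i$. Because $G$ and $w$
are fixed and $\varphi$ can be computed in space logarithmic in the size of the game
instance, we have therefore shown the general PFMC problem to be
ExpTime-hard.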

B Acyclic and $\varepsilon$-Free Case: Proposition 3

We prove here Proposition 3: the PFMC problem is NPTime-complete for
acyclic and $\varepsilon$-free grammars.

B.1 Upper Bound

Proof. Let us show that the parse trees in $L_{G,w}$ are of polynomially bounded
size. The NPTime algorithm then guesses a tree in $L_{G,w}$ and checks that it is
a model in polynomial time (Lange 2006).

Claim 9. Let $G = (N, \Sigma, P, S)$ be an acyclic and $\varepsilon$-free CFG. Let $w \in \Sigma^*$. Any
parse tree $t$ in $L_{G,w}$ has at most $|N|(|w| - 1) + |w|$ nodes.

Consider the run of $A_{G,w}$ on $t$: each node of $t$ is labeled by a state $(i, A, j)$
describing two positions $0 \le i \le j \le n$ in $w$ and a nonterminal $A$ in $N$. Because
$G$ is $\varepsilon$-free, we know that $i < j$. We claim that the set of nodes labelled with
positions $(i, j)$ forms a connected chain.

To see this, suppose two nodes $a$ and $b$ are both labelled with positions $(i, j)$.
Suppose first that neither $a$ nor $b$ is an ancestor of the other. Let then $c \notin \{a, b\}$
be their least common ancestor (lca): $c$ must have at least two children, and its
children will be labelled with non-overlapping positions (recall that $i < j$). Only
one of these non-overlapping intervals can contain the interval $(i, j)$. The child
corresponding to that interval would then be the lca of $a$ and $b$, in contradiction
with $c$ being their lca: hence one of $a$ or $b$ is the lca of $a$ and $b$.

Suppose now w.l.o.g. that $a$ is an ancestor of $b$. Observe that a descendant of
$a$ would be labelled with a sub-interval of $(i, j)$ and an ancestor of $b$ would be
labelled with a super-interval of $(i, j)$. This forces every node in the path from
$a$ to $b$ to be labelled with $(i, j)$. Hence, the nodes labelled with $(i, j)$ form a
connected chain.

Since $G$ is acyclic, each chain of nodes $(i, A_1, j), (i, A_2, j), \dots, (i, A_p, j)$ having
the same positions $(i, j)$ cannot have a non-terminal $A_k$ occurring twice, or the
grammar would allow a cycle. Therefore, each such chain will have at most
$|N|$ nodes. We can "collapse" these chains to form a tree where each $(i, j)$
pair appears at most once, and every node (except the leaves) has at least
two children. Since there are exactly $|w|$ leaves ($G$ is $\varepsilon$-free), there can be at
most $|w| - 1$ internal nodes in such a tree. We obtain that there were at most
$|N|(|w| - 1)$ internal nodes in the original parse tree, i.e. at most $|N|(|w| - 1) + |w|$
nodes in the full parse tree. □

B.2 Lower Bound

Proof. We reduce 3SAT to our problem.

Fix the grammar $G = (\{S, F, T\}, \{a\}, P, S)$ with productions:

$$S \to S\,F \mid S\,T \mid F \mid T \qquad F \to a \qquad T \to a$$

and consider an instance $\bigwedge_{i=1}^{m} C_i$ of 3SAT where each $C_i$ is a disjunction
of literals over $n$ variables $\{x_1, \dots, x_n\}$. Define $w = a^n$.

Any parse tree $t$ of $w$ will have a "comb" shape of length $n$ with $S$-labeled
nodes, each giving rise to one of $F$ or $T$ as a child. The parse forest is thus in
bijection with the set of valuations of $\{x_1, \dots, x_n\}$: if the value of variable $x_i$
is 0, then in our encoding, the $i$th $S$ node has a node with label $F$ as a child;
otherwise, it has a node with label $T$ as a child.

Given such an encoded valuation, our formula $\varphi$ must verify that each clause
is satisfied. For a clause $C_i = \ell_{i,1} \vee \ell_{i,2} \vee \ell_{i,3}$ with $\ell_{i,j} = x_{k_j}$ or $\ell_{i,j} = \neg x_{k_j}$, define
$\psi_i = \bigvee_{j=1}^{3} \langle(S?; \downarrow)^{k_j}\rangle \beta_{i,j}$ where $\beta_{i,j} = F$ if $\ell_{i,j} = \neg x_{k_j}$ and $\beta_{i,j} = T$ otherwise.

Finally, let $\varphi = \bigwedge_{i=1}^{m} \psi_i$. Then $t \models \varphi$ if and only if the corresponding assignment
of the variables is a satisfying assignment. Because $G$ is fixed and $w$ and $\varphi$ can
be computed in space logarithmic in the size of the 3SAT instance, this shows
the NPTime-hardness of the PFMC problem in the acyclic $\varepsilon$-free case. □
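
The reduction can be replayed on small instances. The sketch below is illustrative only: trees are nested `(label, children)` pairs, a clause is a list of `(k, positive)` pairs standing for $x_k$ or $\neg x_k$, and all function names are hypothetical. It builds the comb-shaped parse tree of a valuation and evaluates each $\langle(S?; \downarrow)^{k}\rangle\beta$ directly at the root.

```python
from itertools import product

def comb_tree(valuation):
    """Parse tree of a^n for S -> S F | S T | F | T, F -> a, T -> a.
    The i-th S node from the root gets a T child if x_i is true, else F."""
    node = None
    for value in reversed(valuation):
        leaf = ("T" if value else "F", [("a", [])])
        node = ("S", [leaf] if node is None else [node, leaf])
    return node

def diamond(node, k, beta):
    """Does <(S?; down)^k> beta hold at `node`?"""
    label, children = node
    if k == 0:
        return label == beta
    # the test S? must succeed before each downward step
    return label == "S" and any(diamond(c, k - 1, beta) for c in children)

def models(tree, clauses):
    """Evaluate the conjunction of the clause formulas at the root."""
    return all(
        any(diamond(tree, k, "T" if positive else "F") for k, positive in clause)
        for clause in clauses
    )

def satisfiable(clauses, n):
    """Brute-force PFMC on the parse forest of a^n (exponential in n)."""
    return any(models(comb_tree(v), clauses)
               for v in product([False, True], repeat=n))
```

The exhaustive loop in `satisfiable` stands in for the nondeterministic guess of a parse tree; checking one guessed tree is what runs in polynomial time.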

C Model-Checking Non-Recursive DTDs: Proposition 4

We present in this section a proof of Proposition 4: the satisfiability of a $\mathrm{PDL}_{\mathrm{tree}}$
formula $\varphi$ in presence of a non-recursive DTD $D$ is PSpace-complete.

The lower bound is proved as Proposition 5.1 by Benedikt et al. (2008),
and we follow their general proof plan from Lemma 7.5 for the upper bound.
As presented in the main text, the fact that we consider a non-recursive DTD
means that the height of any tree of interest is bounded by $|N|$, the number
of nonterminals of the DTD. The proof plan is then to consider XML word
encodings of trees, and construct two 2-way alternating parity word automata
(2APWA) $A_D$ and $A_\varphi$ of polynomial size which will respectively recognize the
XML encodings of the trees of $D$ and of the models of $\varphi$ of height bounded by
$|N|$. Then, by taking the conjunction of the two automata, we reduce the initial
satisfiability problem to a 2APWA emptiness problem, which is known to be in
PSpace by the results of Serre (2006).

We can find a suitable construction for an automaton $A_D$ for $D$ as Claim 7.7
of (Benedikt et al. 2008), thus we will only present the construction of $A_\varphi$.

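XML Encoding. Define the alphabet

$$\mathrm{XML}(N) = \{\langle X\rangle, \langle /X\rangle \mid X \in N\}$$

and choose a fresh root symbol $r$ not in $N$. We encode a tree $t$ as $\langle r\rangle\,\mathrm{stream}(t)\,\langle /r\rangle$
where the XML streaming function is defined inductively on terms by

$$\mathrm{stream}(f(t_1 \cdots t_m)) = \langle f\rangle\,\mathrm{stream}(t_1) \cdots \mathrm{stream}(t_m)\,\langle /f\rangle .$$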
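
As a minimal sketch of this encoding, assuming trees are represented as nested `(label, children)` pairs and tags as plain strings (both choices are illustrative):

```python
def stream(tree):
    """XML streaming of a tree given as (label, [children])."""
    label, children = tree
    tokens = [f"<{label}>"]          # opening tag <f>
    for child in children:
        tokens.extend(stream(child)) # stream(t_1) ... stream(t_m)
    tokens.append(f"</{label}>")     # closing tag </f>
    return tokens

def encode(tree, root="r"):
    """Wrap the stream in the fresh root symbol r."""
    return [f"<{root}>"] + stream(tree) + [f"</{root}>"]
```

For instance, `encode(("S", [("F", []), ("T", [])]))` yields the word `<r> <S> <F> </F> <T> </T> </S> </r>`.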

2-Way Alternating Parity Word Automata. A positive boolean formula
$f$ in $\mathbb{B}^+(X)$ over a set $X$ of variables is defined by the syntax

$$f ::= x \mid \top \mid \bot \mid f \wedge f \mid f \vee f$$

where $x$ ranges over $X$. A subset $X' \subseteq X$ satisfies a formula $f$, written $X' \models f$, if the formula is satisfied
by the valuation $x \mapsto \top$ whenever $x \in X'$ and $x \mapsto \bot$ if $x \in X \setminus X'$.
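
The satisfaction relation is a direct structural recursion. The sketch below assumes formulas are encoded as tagged tuples such as `("and", f, g)`; this encoding and the name `satisfies` are illustrative choices.

```python
def satisfies(subset, formula):
    """Does `subset` satisfy the positive boolean formula `formula`?

    Formulas are tuples: ("var", x), ("top",), ("bot",),
    ("and", f, g) or ("or", f, g).
    """
    kind = formula[0]
    if kind == "var":
        return formula[1] in subset   # x is true exactly when x is in X'
    if kind == "top":
        return True
    if kind == "bot":
        return False
    if kind == "and":
        return satisfies(subset, formula[1]) and satisfies(subset, formula[2])
    if kind == "or":
        return satisfies(subset, formula[1]) or satisfies(subset, formula[2])
    raise ValueError(f"unknown connective: {kind}")
```

Since negation is absent, satisfaction is monotone: any superset of a satisfying subset is also satisfying.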

A 2-way alternating parity word automaton is a tuple $A = (Q, \Sigma, \delta, q_0, c)$
where $Q$ is a finite set of states, $\Sigma$ a finite alphabet, $q_0 \in Q$ an initial state, $c$ a
coloring from $Q$ to a finite set of priorities $C \subseteq \mathbb{N}$, and $\delta$ a transition function
from $Q \times \Sigma$ to $\mathbb{B}^+(Q \times \{-1, 0, 1\})$ that associates to a current state and current
symbol boolean formulae on pairs $(q', d)$ of a new state $q'$ and a direction $d$.

A run of a 2APWA on a finite word $w = a_1 \cdots a_n$ in $\Sigma^*$ is a generally infinite
tree with labels in $Q \times \{1, \dots, n\}$ holding a current state and a current position
in $w$, such that the root is labeled $(q_0, 1)$, and every node labeled $(q, i)$ has
a children set $\{(q_1, i_1), \dots, (q_m, i_m)\}$ such that $\{(q_1, i_1 - i), \dots, (q_m, i_m - i)\}$
satisfies $\delta(q, a_i)$. A run is accepting iff
for every branch, the smallest priority $c(q)$ that occurs infinitely often among
the nodes $(q, i)$ is even (this also means in particular that any finite run is
accepting), and $w$ is accepted if there exists some accepting run for it.

Inductive Construction. We construct $A_\varphi = (Q_\varphi, \Sigma, \delta_\varphi, q_{\varphi,0}, c_\varphi)$ by
induction on the subterms of the formula $\varphi$. We work with the alphabet
$\Sigma = \mathrm{XML}(N) \uplus \{\langle r\rangle, \langle /r\rangle\}$ and set $n = |N|$, which is the maximum height
of any tree of $D$. The guiding principle in this construction is that our
inductively constructed automata will track their height relative to that of their
starting position. Because we are working on trees of bounded depth, this can
be achieved by considering states that combine a "control" state with a height
in $\{-n, \dots, n\}$.

Let us start with the base cases for node formulas: by convention, our
automata for a node formula $\varphi$ must check that their starting positions are labeled
by opening tags:

$A_p$ The automaton checks if it starts at an opening node $\langle p\rangle$. It immediately
goes into either an accepting or rejecting state. Formally, $Q_p = \{q_{p,0}\}$,
the coloring $c_p$ maps $q_{p,0}$ to 1, and $\delta_p(q_{p,0}, \langle p\rangle) = \top$ and $\delta_p(q_{p,0}, X) = \bot$
for all $X \neq \langle p\rangle$.

$A_\top$ The automaton immediately goes into an accepting state, unless it is at a
closing node or the root node. Formally, $Q_\top \stackrel{\text{def}}{=} \{q_{\top,0}\}$ and $c_\top$ is defined by
$c_\top(q_{\top,0}) = 1$; $\delta_\top(q_{\top,0}, X)$ is defined as $\top$ for $X = \langle p\rangle$ in $\mathrm{XML}(N)$ and as $\bot$
otherwise.

The automata $A_\pi$ for $\pi$ a path formula additionally carry a distinguished
subset $C_\pi \subseteq Q_\pi$ of continuation states, such that there is a "partial run" from
some initial position with branches starting from their initial state, which are
either infinite but verifying the parity condition, or are finite but end in a
continuation state in a position related to the initial one through $[\![\pi]\!]$. Let us
see this at work with the base cases of path formulae:

$A_\downarrow$ The automaton moves right from the initial node while maintaining the
depth relative to this initial node. It stops (goes into a dead state) if
it reaches a node at the same or lesser depth than the initial node. All
the visited nodes with a relative depth of 1 are direct children of the
initial node, and therefore visited by continuation states. We set
$Q_\downarrow = \{q_0, q_1, \dots, q_n\}$ with $q_{\downarrow,0} = q_0$; the coloring $c_\downarrow$ is identically 1 on
$Q_\downarrow$, $C_\downarrow = \{q_1\}$, and $\delta_\downarrow$ is defined by:

$$\delta_\downarrow(q_i, \langle p\rangle) = (1, q_{i+1}),\ i < n,\ p \in N \qquad \delta_\downarrow(q_n, \langle p\rangle) = \bot,\ p \in N$$

$$\delta_\downarrow(q_i, \langle /p\rangle) = (1, q_{i-1}),\ i > 1,\ p \in N \qquad \delta_\downarrow(q_0, \langle /p\rangle) = \delta_\downarrow(q_1, \langle /p\rangle) = \bot,\ p \in N$$

$$\delta_\downarrow(q_i, \langle /r\rangle) = \bot,\ i \in \{0, \dots, n\} .$$
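
Because the child automaton is deterministic, its behaviour can be simulated by a simple left-to-right scan. The following sketch makes several illustrative assumptions: the word is a list of tag strings such as `"<S>"` and `"</S>"`, positions are 0-based, and the variable `depth` plays the role of the index $i$ of the state $q_i$. It returns the positions reached in the continuation state $q_1$ on an opening tag, i.e. the children of the starting node.

```python
def children_positions(tokens, start, n):
    """Simulate the child automaton from the opening tag at `start`.

    `depth` is the index i of the current state q_i; positions visited in
    q_1 on an opening tag are the children of the starting node.
    """
    depth, pos, found = 0, start, []
    while pos < len(tokens):
        opening = not tokens[pos].startswith("</")
        if opening:
            if depth == 1:
                found.append(pos)   # continuation state q_1 on an opening tag
            if depth == n:
                break               # transition from q_n on an opening tag is bottom
            depth += 1              # move right into q_{i+1}
        else:
            if depth <= 1:
                break               # q_0 and q_1 are dead on closing tags
            depth -= 1              # move right into q_{i-1}
        pos += 1
    return found
```

The scan stops exactly when the closing tag of the starting node is read, so it never visits positions outside the subtree.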

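$A_\rightarrow$ Similarly to $A_\downarrow$, the automaton moves right while maintaining the depth
relative to the initial node. It fails if it reaches a node at a lesser depth
than the initial node. Otherwise, it finds the next node at the same depth
as the initial node. Formally, $Q_\rightarrow = \{q_0, q_1, \dots, q_n, q_f\}$, $q_{\rightarrow,0} = q_0$, the
coloring $c_\rightarrow$ is identically 1 on $Q_\rightarrow$, there is a unique continuation state
$C_\rightarrow = \{q_f\}$ when reaching the right sibling, and $\delta_\rightarrow$ is defined by

$$\delta_\rightarrow(q_i, \langle p\rangle) = (1, q_{i+1}),\ i < n,\ p \in N \qquad \delta_\rightarrow(q_n, \langle p\rangle) = \bot,\ p \in N$$

$$\delta_\rightarrow(q_i, \langle /p\rangle) = (1, q_{i-1}),\ i > 1,\ p \in N \qquad \delta_\rightarrow(q_1, \langle /p\rangle) = (1, q_f),\ p \in N$$

$$\delta_\rightarrow(q_0, \langle /p\rangle) = \bot,\ p \in N \qquad \delta_\rightarrow(q_f, X) = \bot,\ X \in \Sigma$$

$$\delta_\rightarrow(q_i, \langle /r\rangle) \stackrel{\text{def}}{=} \bot,\ i \in \{0, \dots, n\} .$$

$A_\uparrow$, $A_\leftarrow$ We define these automata similarly to $A_\downarrow$ and $A_\rightarrow$. Observe however
that, because we always finish on opening brackets, it is not enough to
exchange $-1$ and $1$ in the directions of transitions.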

Next, we consider the induction step for path formulae: 

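$A_{\pi_1;\pi_2}$ We combine the automata $A_{\pi_1}$ and $A_{\pi_2}$. We add transitions from
the continuation states of $A_{\pi_1}$ at opening nodes to the initial state of
$A_{\pi_2}$. Formally, $Q_{\pi_1;\pi_2} = Q_{\pi_1} \uplus Q_{\pi_2}$, $q_{\pi_1;\pi_2,0} = q_{\pi_1,0}$, $c_{\pi_1;\pi_2}$ preserves the
priorities of $c_{\pi_1}$ and $c_{\pi_2}$, $C_{\pi_1;\pi_2} \stackrel{\text{def}}{=} C_{\pi_2}$ and $\delta_{\pi_1;\pi_2}$ is defined by

$$\delta_{\pi_1;\pi_2}(q_1, X) = \delta_{\pi_1}(q_1, X), \quad q_1 \in Q_{\pi_1} \setminus C_{\pi_1}$$

$$\delta_{\pi_1;\pi_2}(q_1, \langle p\rangle) = \delta_{\pi_1}(q_1, \langle p\rangle) \vee (0, q_{\pi_2,0}), \quad q_1 \in C_{\pi_1}$$

$$\delta_{\pi_1;\pi_2}(q_1, \langle /p\rangle) = \delta_{\pi_1}(q_1, \langle /p\rangle), \quad q_1 \in C_{\pi_1}$$

$$\delta_{\pi_1;\pi_2}(q_2, X) = \delta_{\pi_2}(q_2, X), \quad q_2 \in Q_{\pi_2}$$

for $X$ in $\Sigma$ and $p$ in $N$.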
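$A_{\pi_1+\pi_2}$ This is a straightforward union: we define $Q_{\pi_1+\pi_2} = Q_{\pi_1} \uplus Q_{\pi_2} \uplus
\{q_{\pi_1+\pi_2,0}\}$, $C_{\pi_1+\pi_2} = C_{\pi_1} \cup C_{\pi_2}$, $c_{\pi_1+\pi_2} = c_{\pi_1} \cup c_{\pi_2} \cup \{(q_{\pi_1+\pi_2,0}, 1)\}$,
$\delta_{\pi_1+\pi_2} = \delta_{\pi_1} \cup \delta_{\pi_2} \cup \{(q_{\pi_1+\pi_2,0}, X, (0, q_{\pi_1,0}) \vee (0, q_{\pi_2,0})) \mid X \in \Sigma\}$.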

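$A_{\pi^*}$ This case is similar to that of $A_{\pi_1;\pi_2}$; we add transitions from the critical
states of $A_\pi$ to the initial state. Define $Q_{\pi^*} = Q_\pi \uplus \{q_{\pi^*,0}\}$, $C_{\pi^*} \stackrel{\text{def}}{=}
\{q_{\pi^*,0}\}$, $c_{\pi^*} = c_\pi \cup \{(q_{\pi^*,0}, 1)\}$, and

$$\delta_{\pi^*}(q_{\pi^*,0}, X) = (0, q_{\pi,0}), \qquad \delta_{\pi^*}(q, \langle /p\rangle) = \delta_\pi(q, \langle /p\rangle),\ q \in C_\pi,$$

$$\delta_{\pi^*}(q, X) = \delta_\pi(q, X),\ q \in Q_\pi \setminus C_\pi, \qquad \delta_{\pi^*}(q, \langle p\rangle) = \delta_\pi(q, \langle p\rangle) \vee (0, q_{\pi^*,0}),\ q \in C_\pi$$

for $X$ in $\Sigma$ and $p$ in $N$. Because we assigned an odd priority to $q_{\pi^*,0}$, the
automaton cannot loop indefinitely in $q_{\pi^*,0}$ and must eventually continue.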

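$A_{\varphi?}$ Define $Q_{\varphi?} = Q_\varphi \uplus \{q_{\varphi?,0}, q_f\}$, $C_{\varphi?} = \{q_f\}$, $c_{\varphi?}$ extends $c_\varphi$ with $c_{\varphi?}(q_{\varphi?,0}) =
c_{\varphi?}(q_f) = 1$, and

$$\delta_{\varphi?}(q_{\varphi?,0}, X) = (0, q_f) \wedge (0, q_{\varphi,0}), \qquad \delta_{\varphi?}(q_f, X) = \bot, \qquad \delta_{\varphi?}(q, X) = \delta_\varphi(q, X),$$

for $X$ in $\Sigma$ and $q$ in $Q_\varphi$.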

Finally, we consider the induction step for node formulae: 
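
$A_{\langle\pi\rangle\varphi}$ We construct the automaton by joining the critical states of $A_\pi$ with
the initial state of $A_\varphi$. Define $Q_{\langle\pi\rangle\varphi} = Q_\pi \uplus Q_\varphi$, $q_{\langle\pi\rangle\varphi,0} = q_{\pi,0}$, $c_{\langle\pi\rangle\varphi} =
c_\pi \cup c_\varphi$, and $\delta_{\langle\pi\rangle\varphi}$ by

$$\delta_{\langle\pi\rangle\varphi}(p, X) = \delta_\pi(p, X),\ p \in Q_\pi \setminus C_\pi \qquad \delta_{\langle\pi\rangle\varphi}(q, X) = \delta_\varphi(q, X),\ q \in Q_\varphi,$$

$$\delta_{\langle\pi\rangle\varphi}(p, X) = \delta_\pi(p, X) \vee (0, q_{\varphi,0}),\ p \in C_\pi$$

for $X$ in $\Sigma$.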

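$A_{\varphi_1 \wedge \varphi_2}$ We do a simple conjunction of the automata $A_{\varphi_1}$ and $A_{\varphi_2}$: define
$Q_{\varphi_1 \wedge \varphi_2} \stackrel{\text{def}}{=} Q_{\varphi_1} \uplus Q_{\varphi_2} \uplus \{q_{\varphi_1 \wedge \varphi_2,0}\}$, $c_{\varphi_1 \wedge \varphi_2} \stackrel{\text{def}}{=} c_{\varphi_1} \cup c_{\varphi_2} \cup \{(q_{\varphi_1 \wedge \varphi_2,0}, 1)\}$,
$\delta_{\varphi_1 \wedge \varphi_2} = \delta_{\varphi_1} \cup \delta_{\varphi_2} \cup \{(q_{\varphi_1 \wedge \varphi_2,0}, X, (0, q_{\varphi_1,0}) \wedge (0, q_{\varphi_2,0})) \mid X \in \Sigma\}$.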

$A_{\neg\varphi}$ We essentially construct the dual $\mathrm{dual}(A_\varphi)$ of $A_\varphi$: the former accepts the
complement of the language accepted by $A_\varphi$. However, we need to ensure
that only opening nodes $\langle p\rangle$ are accepted, thus intersect with the
automaton $A_\top$ that only accepts opening nodes. Formally, $\mathrm{dual}(A_\varphi) =
(Q_\varphi, \Sigma, \delta_{\neg\varphi}, q_{\varphi,0}, c_{\neg\varphi})$ where $\delta_{\neg\varphi}(q, X) = \mathrm{dual}(\delta_\varphi(q, X))$ and $c_{\neg\varphi}(q) =
c_\varphi(q) + 1$ for all $q \in Q_\varphi$ and $X \in \Sigma$. Here, $\mathrm{dual}$ is a function from
$\mathbb{B}^+(Q \times \{-1, 0, 1\})$ to itself that applies the usual De Morgan's laws. It
is easy to check that $\mathrm{dual}(A_\varphi)$ accepts the complement of $L(A_\varphi)$.
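
On transition formulas, dualization simply swaps conjunctions with disjunctions and $\top$ with $\bot$, leaving atoms untouched. A minimal sketch, assuming formulas are encoded as tagged tuples (an illustrative encoding; the helper `satisfies` is included only to state the expected property):

```python
def dual(f):
    """Dual of a positive boolean formula given as tagged tuples."""
    kind = f[0]
    if kind == "var":
        return f                                  # atoms are left unchanged
    if kind == "top":
        return ("bot",)
    if kind == "bot":
        return ("top",)
    if kind == "and":
        return ("or", dual(f[1]), dual(f[2]))     # De Morgan
    if kind == "or":
        return ("and", dual(f[1]), dual(f[2]))    # De Morgan
    raise ValueError(f"unknown connective: {kind}")

def satisfies(subset, f):
    """Satisfaction of a positive boolean formula by a set of atoms."""
    kind = f[0]
    if kind == "var":
        return f[1] in subset
    if kind in ("top", "bot"):
        return kind == "top"
    left, right = satisfies(subset, f[1]), satisfies(subset, f[2])
    return (left and right) if kind == "and" else (left or right)
```

The property behind complementation is that a set of atoms satisfies `dual(f)` exactly when its complement fails to satisfy `f`.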

D $\varepsilon$-Free Case

We prove in this section Propositions 6 and 7, thus showing that the PFMC
problem is PSpace-complete in this case.

D.1 Upper Bound: Proposition 6

Proof. Let $G = (N, \Sigma, P, S)$, $w$ be a string in $\Sigma^*$, and $\varphi$ be a $\mathrm{PDL}_{\mathrm{tree}}$ formula.
Without loss of generality, we assume $G$ to have productions with right-parts of
length at most 2; since $G$ is $\varepsilon$-free, these right-parts have length at least one. We
want to construct a non-recursive DTD $D = (N', P', S')$ and a $\mathrm{PDL}_{\mathrm{tree}}$ formula
$\varphi'$ s.t. the parse forest model checking problem on $G$, $w$, and $\varphi$ has a solution
iff $\varphi'$ is satisfiable in presence of $D$, thereby reducing our instance to an instance
of a problem in PSpace by Proposition 4.

We want to construct $D$ from the polynomial-sized automaton $A_{G,w}$ by
removing chains of unit rules $q \to A(q')$ of $A_{G,w}$; recall that this automaton
uses states of form $q = (i, X, j)$ where $0 \le i \le j \le |w|$ and $X$ is in $V$. Let for
this $\bar{Q}_{G,w}$ be a disjoint copy of $Q_{G,w}$ and define $N' = Q_{G,w} \uplus \bar{Q}_{G,w}$.

Chain Sequences. For each $q$ in $Q_{G,w}$, we consider the set of sequences
of successive states $q = q_0, q_1, \dots, q_n$ we can visit using only unit rules $q_i \to
A_i(q_{i+1})$ of $\delta_{G,w}$ and such that $q_n$ has a binary rule $q_n \to A_n(q'\, q'')$ or a nullary
rule $q_n \to a()$ in $\delta_{G,w}$. More precisely, we are interested in the relabeled sequence
$\bar{q}_0, \bar{q}_1, \dots, \bar{q}_{n-1}, q_n$ of copies $\bar{q}_i$ of $q_i$, except on the very last position. We call
$\mathrm{chains}(q)$ the language of such sequences. Formally, $\mathrm{chains}(q)$ is a regular
language over $N'$ that we can define thanks to a NFA $A_q = (N', N', \delta_q, \{q\}, Q_{G,w})$
with state space $N'$ where

$$\delta_q = \{(p, \bar{p}, p') \mid \exists A \in N,\ p \to A(p') \in \delta_{G,w}\}$$

$$\cup\ \{(p, p, p) \mid \exists A \in N, \exists p_1, p_2 \in Q_{G,w},\ p \to A(p_1\, p_2) \in \delta_{G,w}\}$$

$$\cup\ \{(p, p, p) \mid \exists a \in \Sigma,\ p \to a() \in \delta_{G,w}\} .$$

Note that $A_q$ has a size linear in that of $A_{G,w}$. We can see chains as a
regular substitution from $Q^*_{G,w}$ to $N'^*$ by setting $\mathrm{chains}(\varepsilon) = \varepsilon$ and $\mathrm{chains}(uv) \stackrel{\text{def}}{=}
\mathrm{chains}(u)\,\mathrm{chains}(v)$ for all $u, v$ in $Q^*_{G,w}$.

Figure 5: The tree transformation for the proof of Proposition 6.



The DTD. We can now express the productions $P'$ of $D$:

$$P'(\bar{q}) \stackrel{\text{def}}{=} \varepsilon \qquad P'(q) \stackrel{\text{def}}{=} \bigcup_{q \to A(q_1 q_2) \in \delta_{G,w}} \mathrm{chains}(q_1\, q_2)\ \cup \bigcup_{q \to a() \in \delta_{G,w}} \varepsilon .$$

Thus, the symbols in $\bar{Q}_{G,w}$ are non-productive and only employed to represent
a chain sequence that has been transformed into a sequence of siblings in the
DTD. Also, because any word in $\mathrm{chains}(q)$ for some $q$ is of form $\bar{u}p$ with $\bar{u}$ in
$\bar{Q}^*_{G,w}$ and $p$ in $Q_{G,w}$, any internal node in a tree of $D$ has exactly two children
labeled by states in $Q_{G,w}$. Therefore, and because $G$ is $\varepsilon$-free, we get that $D$
is non-recursive. See Figure 5 for an illustration of the tree transformation we
operated.



The Formula. It remains to define a formula $\varphi'$ that will be interpreted
on the transformed trees of $D$. For this, we need to interpret the atomic
propositions in $AP = V$ over the new set of labels $N'$, and to interpret the child $\downarrow$ and
sibling $\rightarrow$ relations.

Regarding the atomic propositions, we can interpret a label $X$ in $V$ as

$$\bigvee_{0 \le i \le j \le |w|} (i, X, j) \vee \overline{(i, X, j)} \qquad \text{(interpretation of } X\text{)}$$

over $N'$.

Regarding the relations, we first define $\mathrm{bar} \stackrel{\text{def}}{=} \bigvee_{q \in Q_{G,w}} \bar{q}$, which helps us
differentiate between "rotated" nodes and preserved ones. We then interpret $\downarrow$ as a
disjunction of paths depending on whether we are on a rotated node, where the
test $[\leftarrow]\neg\mathrm{bar}$ allows us to check that the current node is an "original" child:

$$(\neg\mathrm{bar}?; \downarrow; [\leftarrow]\neg\mathrm{bar}?) + (\mathrm{bar}?; \rightarrow) \qquad \text{(interpretation of } \downarrow\text{)}$$

For $\rightarrow$ we set:

$$([\leftarrow]\neg\mathrm{bar}?); (\mathrm{bar}?; \rightarrow)^*; \neg\mathrm{bar}?; \rightarrow . \qquad \text{(interpretation of } \rightarrow\text{)}$$

The initial test prevents the nodes taken from a chain from having a right
sibling; then the test sequence advances to the end of the chain before we make
the actual move to the original right sibling.

We can conclude by noting that both $D$ and $\varphi'$ can be computed in time
polynomial in the size of the input and invoking Proposition 4. □

D.2 Lower Bound: Proposition 7

Proof. We reduce from the membership problem of linear bounded automata
(LBA). Suppose we are given an LBA $M = (Q, \Gamma, \Sigma, \delta, q_1, F)$ with state set
$Q$, tape alphabet $\Gamma$, input alphabet $\Sigma \subseteq \Gamma$, transition relation $\delta \subseteq Q \times \Gamma \times
Q \times \Gamma \times \{-1, 0, 1\}$, initial state $q_1 \in Q$, and set of final states $F \subseteq Q$. Let
$Q = \{q_1, \dots, q_\ell\}$; we assume that $\Gamma = \{a_1, a_2, \dots, a_m\}$ contains two endmarkers
$a_1 = \triangleleft$ and $a_2 = \triangleright$ that surround the input and are never erased nor crossed
during the run of the machine.

We are also given a string $x = b_1 b_2 \cdots b_n$ with each $b_i \in \Sigma$; $b_1 = \triangleleft$ and $b_n = \triangleright$.
We have to decide whether $x$ is accepted by $M$. We are going to construct a
word $w$, a CFG $G$, and a $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$ formula $\varphi$, s.t. the PFMC problem has a
solution for $(w, G, \varphi)$ iff $M$ accepts $x$.

Encoding as Linear Trees. A configuration of $M$ is a sequence of the form
$\triangleleft\gamma\, q\, \gamma'\triangleright$ where $q$ is the current state in $Q$, $\triangleleft\gamma\gamma'\triangleright$ is the current tape
contents of length $n$, and $|\triangleleft\gamma| = h$ indicates that the head is currently on the last symbol
of $\triangleleft\gamma$, i.e. the $h$th symbol of the tape.

We encode such a configuration by a contiguous sequence $\sigma$ of nodes as
follows:

• The first node is $S$ and it is followed by a sequence of $n$ nodes, among
which one is labeled $H$ and the others $\bar{H}$; the position of $H$ in this sequence
denotes the position of the head in the configuration of $M$.

• This sequence is followed by a sequence of $\ell$ nodes, one labeled $C$ and the
others $\bar{C}$, which together describe the current state as $q_k$ if the occurrence
of $C$ is the $k$th symbol in the sequence.

• Then we encode the tape contents as $n$ successive sequences each of length
$m$ of nodes, with each time one labeled $A$ and the others $\bar{A}$. The $i$th such
sequence encodes the contents of the $i$th cell of the tape of $M$ with $A$
occurring at the $j$th position indicating that this cell contains $a_j$.

Thus $\sigma$ is of length $1 + n + \ell + nm$.
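
The encoding is a concatenation of one-hot blocks, which the following sketch makes explicit. The representation is an assumption made for illustration: barred labels are written with a trailing `~`, states and symbols are given by their indices, and the function name `encode_configuration` is hypothetical.

```python
def encode_configuration(head, state, tape, n, l, m):
    """Label sequence encoding one LBA configuration.

    head in 1..n, state index in 1..l, tape is a list of n symbol indices
    in 1..m. Barred labels are written 'H~', 'C~' and 'A~'.
    """
    assert len(tape) == n
    labels = ["S"]
    # head position: one H among n nodes
    labels += ["H" if pos == head else "H~" for pos in range(1, n + 1)]
    # current state: one C among l nodes
    labels += ["C" if k == state else "C~" for k in range(1, l + 1)]
    # tape contents: for each cell, one A among m nodes
    for symbol in tape:
        labels += ["A" if j == symbol else "A~" for j in range(1, m + 1)]
    return labels
```

With this layout, the node stating that cell $i$ holds $a_j$ sits at depth $n + \ell + (i-1)m + j$ below the $S$ node, and the next $S$ node, if any, sits at depth $1 + n + \ell + nm$.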

String and Grammar. We fix $w = a$ and $G = (\{S, H, \bar{H}, A, \bar{A}, C, \bar{C}\}, \{a\}, P, S)$
with productions

$$S \to H \mid \bar{H}$$

$$H \to H \mid \bar{H} \mid C \mid \bar{C} \qquad C \to C \mid \bar{C} \mid A \mid \bar{A} \qquad A \to A \mid \bar{A} \mid S \mid a$$

$$\bar{H} \to H \mid \bar{H} \mid C \mid \bar{C} \qquad \bar{C} \to C \mid \bar{C} \mid A \mid \bar{A} \qquad \bar{A} \to A \mid \bar{A} \mid S \mid a .$$

Therefore, the trees in the parse forest $L_{G,w}$ are essentially sequences over $N^* \cdot
\{a\}$. Clearly, all the encodings of finite runs of any LBA $M$ will be in this set;
it will be the formula's task to look for an accepting run of our particular $M$ on
$x$ among all these trees.

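The Formula $\varphi_{M,x}$. Let us turn to the definition of our $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$
formula. We start by defining low-level formulae useful for testing the properties
of the current configuration: assume we are on an $S$-labeled node, then

$$h(h) \stackrel{\text{def}}{=} \langle\downarrow^{h}\rangle H \wedge \bigwedge_{h' \in \{1, \dots, n\} \setminus \{h\}} \langle\downarrow^{h'}\rangle \bar{H}$$

tests whether the head is at position $h$. In the same way,

$$q(k) \stackrel{\text{def}}{=} \langle\downarrow^{n}\rangle\Big(\langle\downarrow^{k}\rangle C \wedge \bigwedge_{k' \in \{1, \dots, \ell\} \setminus \{k\}} \langle\downarrow^{k'}\rangle \bar{C}\Big)$$

then tests whether the current state is $q_k$, and

$$p(i,j) \stackrel{\text{def}}{=} \langle\downarrow^{n+\ell+(i-1)m}\rangle\Big(\langle\downarrow^{j}\rangle A \wedge \bigwedge_{j' \in \{1, \dots, m\} \setminus \{j\}} \langle\downarrow^{j'}\rangle \bar{A}\Big)$$

tests whether the $i$th position on the tape is symbol $a_j$. Finally, we can go to the
next configuration by the path

$$\mathrm{next} \stackrel{\text{def}}{=} \downarrow^{1+n+\ell+nm} .$$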

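We can now check that a parse tree of $L_{G,w}$ is really the encoding of an accepting
run of $M$ on $x$. First, at each $S$ node, we should find a full configuration:

$$\varphi_{\mathrm{conf}} \stackrel{\text{def}}{=} [\downarrow^*] S \supset \bigvee_{h=1}^{n} h(h) \wedge \bigvee_{k=1}^{\ell} q(k) \wedge \bigwedge_{i=1}^{n} \bigvee_{j=1}^{m} p(i,j) \wedge \langle\mathrm{next}\rangle (a \vee S) .$$

The initial configuration should have its head on the initial position 1, be in the
initial state $q_1$, and have $x = b_1 \cdots b_n$ as tape contents:

$$\varphi_{\mathrm{init}} \stackrel{\text{def}}{=} S \wedge h(1) \wedge q(1) \wedge \bigwedge_{i=1}^{n} p(i, b_i) .$$

The leaf of the tree should be reached in a final configuration:

$$\varphi_{\mathrm{final}} \stackrel{\text{def}}{=} [\downarrow^*](S \wedge \langle\mathrm{next}\rangle a) \supset \bigvee_{q_k \in F} q(k) .$$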
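
Successive configurations should respect the transition relation:

$$\varphi_{\mathrm{trans}} \stackrel{\text{def}}{=} [\downarrow^*](S \wedge \neg\langle\mathrm{next}\rangle a) \supset \bigvee_{h=1}^{n} \bigvee_{k=1}^{\ell} \bigvee_{c=1}^{m} \bigvee_{(q_k, a_c, q_{k'}, a_{c'}, d) \in \delta} \Big(h(h) \wedge q(k) \wedge p(h, c)$$

$$\qquad \wedge \Big(\bigwedge_{i \in \{1, \dots, n\} \setminus \{h\}} \bigvee_{j=1}^{m} p(i,j) \wedge \langle\mathrm{next}\rangle p(i,j)\Big) \wedge \langle\mathrm{next}\rangle(h(h + d) \wedge q(k') \wedge p(h, c'))\Big) .$$

We finally define our $\mathrm{PDL}_{\mathrm{core}}[\downarrow]$ formula as the conjunction of the previous
formulae:

$$\varphi_{M,x} \stackrel{\text{def}}{=} \varphi_{\mathrm{conf}} \wedge \varphi_{\mathrm{init}} \wedge \varphi_{\mathrm{final}} \wedge \varphi_{\mathrm{trans}} .$$

To conclude, we observe that a tree in $L_{G,w}$ is a model of $\varphi_{M,x}$ iff there is an
accepting run of $M$ on $x$. As $G$ and $w$ are fixed and $\varphi_{M,x}$ can be computed in
space logarithmic in the size of $(M, x)$, this proves the PSpace-hardness of the
PFMC problem in the $\varepsilon$-free case. □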